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Abstract. We describe the decomposition of QSO absorption line ensembles applying an evolutionary forward modelling 
technique. The modelling is optimized using an evolution strategy (ES) based on a novel concept of completely derandomized 
self-adaption. The algorithm is described in detail. Its global optimization performance in decomposing a series of simulated 
test spectra is compared to that of classical deterministic algorithms. Our comparison demonstrates that the ES is a highly 
competitive algorithm capable to calculate the optimal decomposition without requiring any particular initialization. 
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1. Introduction 

The standard astronomical data analysis packages such as the 
Image Reduction and Analysis Facility (IRAF) and many pop- 
ular custom-built applications offer only deterministic algo- 
rithms for the purpose of parametric model fitting. In prac- 
tice, however, deterministic algorithms often require consider- 
able operational intervention by the user In contrast, stochastic 
strategies such as evolutionary algorithms minimize the opera- 
tional interaction, a highly appealing feature. 

In practice, any parametric model fitting technique reduces 
to the numerical problem of finding the minimum of an objec- 
tive function / : R" — > R, i.e. constructing a sequence of object 
parameter vectors 



(«) 



(1) 



such that lim^^oo fix'-^^) is as small as possible. Several clas- 
sical algorithms are practicable to overcome the problem of 
minimization. Among the best established strategies are the 
conjugate gradient, variable metric, and Levenberg-Marquardt 
algorithms, all gathering information about the local topogra- 
phy of / by calculating its partial derivatives (i.e. the gradient 
or the Hessian matrix) and there by ensuring the ra pid conver- 
gence at a close minimum (e.g.. |Press et~al1l2002l Numerical 
Recipes). The nature of these classical algorithms is determin- 
istic: each member of the sequence {x^^^)g is determined by its 
predecessor, i.e. the whole sequence is determined by the in- 
tial object parameter vector jc*"*. Exactly this fact turns out to 
be the cardinal deficiency in case / exhibits many local min- 
ima: The success of the algorithms in locating the global mini- 
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mum of / crucially depends on the adequacy of the initial guess 
jc*'^'. Hence, the results are biased and several optimization runs 
are normally necessary. In particular in case of noisy data or 
strong inter-parameter correlations the problem of finding an 
adequate initialization j:'^'** is tedious and often dominates the 
time needed to complete the optimization. In addition, the clas- 
sical algorithms are not practicable if the objective function is 
not continuously differentiable, oscillating, or the calculation 
of partial derivatives is too expensive or numerically inaccu- 
rate. 

In contrast, stochastic strategies such as evolutionary algo- 
rithms where each member of the sequence ix^^^)g is the re- 
sult of a random experiment, are not only promising for the 
purpose of global optimization but also apply to any objec- 
tive function which is computable or available by experiment. 
However, the drawback of stochastic optimization strategies is 
less efficiency: even if an adequate local downhill move exists, 
a random step will almost always lead astray. Therefore, many 
more evaluations of the objective function are required before 
the sequence (jc*^*)g converges at a minimum. 

Evolutionary algorithms are inspired by the principles of 
biological evolution. Conceptually, three major subclasses are 
discerned; evolutionary programming, genetic algorithms, and 
evolution strategies (ES). While all subclasses apply quite gen- 
erally, ES are particularly suited for the purpose of continuous 
parametric optimizatio n. LargelV | Ha nseii & Ostermei er (2001, 
hereafter Pa^er A) and lHansen etalJ J20oJ hereafter Paper B) 
have established a novel concept of completely derandomized 
self-adaption in ES which selectively approximates the inverse 
Hessian matrix of the objective function and thereby consider- 
ably improves the efficiency in case of non-separable problems 
or mis-scaled parameter mappings while even increasing the 
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chance of finding the global optimum in case of strong multi- 
modality. 

The interpretation and analysis of QSO absorption lines ba- 
sically involves the decomposition of line ensembles into in- 
dividual line profiles. In general, the decomposition of QSO 
spectra presents an ambigous parametric inverse problem and 
automatizing the decomposition demands an efficient but at 
first stable optimization algorithm. In this study we resume the 
c oncept of completely derandomized self-adaption introduced 
in IPaper Al and test the global optimization performance of the 
resulting ES when applied to the problem of line profile de- 
composition in QSO spectra. 

2. Evolution strategies (ES) 

2.1. General concepts 

The state of an ES in generation g is defined by the parental 
family of object parameter vectors x^f , . . . ,x^'^lj^ e R" and the 
mutation operator p^*' : R" R". The generic ES algorithm is 
completely defined by the transition from generation gio g+\. 
For instance, in the illustrative case of a simple single parent 
strategy: 

1 . Mutation of the parental vector x'^^^ to produce a new pop- 
ulation of A> \ offsprings 

^ p^g)x^«'^ ^N(x^«\(T^I), (2) 



te+i) (j+i) ^ 



where each off'spring is sampled from an n-dimensional 
normal mutation distribution with mean jc*^' and isotropic 
variance cr^. 

2. Selection of the best individual among the ofi^spring popu- 
lation to become the new parental vector 



(3) 



where the notation jj,:^! refers to the / best amongji, . . . ,yx 
individuals, andj?, is better thanjfy if /(j,) < fiyj)- 

In general, the transition of the parental family of object 
parameter vectors x''^\ . . . , Jcjf' from generation g to ^ + 1 is 
accomplished according to the following coherent scheme: 

1 . Recombination of p < fi randomly selected parental vec- 
tors into the recombinant vector {x^^K The recombination 
is either a definite or random algebraic operation. 

2. Mutation of the recombinant vector (jr)'^** to produce a new 
population of A> fi offsprings 



te+i) 



(4) 



where the maximally unbiased mutation operator corre- 
sponds to an n-dimensional correlated normal mutation dis- 
tribution with zero mean and covariance matrix C^^\ i.e 



(5) 



3. Selection of the fi best individuals among the ofi^spring pop- 
ulation (or the union of offsprings and parents) to become 
the next generation of parental vectors 




Fig. 1. Examples of isotropic, uncorrected, and correlated mu- 
tation distributions (indicated by circles, axis-parallel ellip- 
soids, and rotated ellipsoids, respectively) rendered over an 
elongated valley topography. The best average progress toward 
the topographic minimum in direction of the upper right corner 
is achieved in the right panel, where the mutation distribution 
(solid ellipsoid) is adapted to the topography 



In contrast to the transition of parental vectors, there is no 
coherent conceptual scheme for the transition of the mutation 
operator p^^^ from generation gtog + 1. However, due to basic 
considerations, any advanced transition scheme is expected to 
reflect the following elementary principles: 

1. Invariance of the resulting ES with respect to any strictly 
monotone remapping of the range of the objective function 
as well as any linear transformation of the object parame- 
ter space. In particular, the resulting ES is expected to be 
unaffected by translation, rotation, and reflection. 

2. Self-adaption of (the shape of) the mutation distribution to 
the topography of the objective function (Fig.Q. In partic- 
ular, the mutation distribution is expected to reproduce the 
precedently selected mutation steps with increased likeli- 
hood. 

The rigid implementation of these principles results in a 
concept of completely derandomized self-adaption where the 
transition of the mutation operator from one generation to the 
next is accomplished by successively updating the covariance 
matrix of the mutation distribution with information provided 
by the actually selected mutation step. The further demand of 
nonlocality involves the cumulation of the selected mutation 
steps into an evolution path. 

2.2. Covariance matrix adaption (CMA) 

If Zi, . . . , ZmeN G K-", m > 11 are a generating set of R" and 
qi, . . .,q„, e R are a sequence of (0, l)-normally distributed 
random variables, then 



qiZ\ + ■■■ + qmZn 



(7) 



renders an «-dimensional normal distribution with zero mean 
and covariance matrix 



ziz] + 



T 

+ ZmZff^- 



(8) 



i - 1, . . . 



(6) 



In fact, Eq. facilitates the realization of any normal distri- 
bution with zero mean. In particular, the symmetric rank-one 
matrix ZizJ corresponds to that normal distribution with zero 
mean producing the vector z, with the maximum likelihood. 
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Fig. 2. The conceptual scheme of covariance matrix adaption 
(CMA). Initially, the covariance matrix of the mutation dis- 
tribution is C*^*'^ - eicj + eze^- In the course of the transi- 
tion to the next generation g + I the symmetric rank-one ma- 



trix 



where Z^J,^'^ is the actually selected muta- 



tion step, is added to the downscaled covariance matrix. In 
the third generation, the covariance matrix is C*^^' - c^e\e[ + 
+ c2(3-')2» (2»)T ^ ^2^^i,T + ^ifj^j^T^ ^^ere bi 
and b2 are the unit eigenvectors of C^^' coiTesponding to the 
eigenvalues and t/j. The resulting mutation distribution reads 
A^(0, 0^'>) = A^(0, l)dibi+ N(0, 1) c/a^a 



Since the objective of CMA is to reproduce the precedently se- 
lected mutation steps with increased likelihood, the whole trick 
in order to accomplish this objective is to add the symmetric 
rank-one matrix 



,te+l)/,(«+l)xT 



'■sel 



(9) 



where Z^*!^'' denotes the actually selected mutation step, to the 
covariance matrix of the mutation distribution. The conceptual 
scheme is illustrated in Fig. |2| Initially, the mutation distribu- 
tion is isotropic 



N(0,C^°^) = N(0, +N(0, 1)C2, 



(10) 



whereas in the course of the transition to the next generation the 
symmetric rank-one matrix Eq. is added to the downscaled 
covariance matrix 



(11) 



In the course of the third generation, for instance, the mutation 
distribution reads 



N(0,C^^^) = N(0, l)c^ei+N(0, l)c^e2 + YjN(0, l)c^-'z 



3-.,(0 
sel 



(=1 



= N(Q,l)dibi +N(Q,l)d2b2, 



(12) 



where bi and b2 are the unit eigenvectors of C*^' coiTespond- 
ing to the eigenvalues d^ and Obviously, the mutation dis- 
tribution tends to reproduce the precedently selected muta- 
tions steps. Finally, the mutation distribution becomes station- 
ary (apart from scale) while its principal axes achieve conjugate 
perpendicularity, and C*-^* effectively approaches the scaled in- 
verse Hessian matrix of the objective function. 

2.3. Evolution path cumulation 

The efficiency as well as the stability of an ES improve signifi- 
cantly, if the decision of how to adapt the mutation distribution 
is based on the cumulation of several selected mutation steps 
rather than a single step. While the latter maximizes the local 
selection probability the former is more likely to advance the 
global progress rate. The benefit from superseding the succes- 
sively selected mutation steps in favor of the evolution path 



(1 - c)p^s) + ^c(2 - c)ztl'\ c e (0, 1] 



(13) 



is illustrated in detail in IPaoer aI If several successively se- 
lected mutation steps are parallel (antiparallel) coiTelated, the 
evolution path will be lengthened (shortened). If the evolution 
path is long (short), the size of the mutation steps in direction 
of the evolution path increases (decreases). The effect of cumu- 
lation is particularly beneficial in the case of small populations 
where the topographical information gathered within one gen- 
eration is not sufficient. If c = 1, no cumulation will occur and 



„(«+!) 



= Z 



te+1) 



2.4. The CMA evolution strategy (CMA-ES) 

In this section we compile a unified formulation of the generic 
CMA-ES algorithms given in Papers A and B. In this form, 
the algorithm is not presented elsewhere. Besides the rank-one 
CMA outlined in the previous sections the algorithm features 
the advanced-rank CMA introduced in Paper B and an addi- 
tional step size control. Finally, we conclude with remarks con- 
cerning the numerical implementation of the CMA-ES algo- 
rithm we provide online. 



2.4.1. Generic algorithm 

The state of the (ju, /l)-CMA-ES in generation g is defined 



by the family of parental vectors x 



(if) 



the co- 



variance matrix of the mutation distribution C'** e W^", the 
global mutation step size cr*^' e R^, and the evolution paths 
pf\p"a G R"- The generic algorithm is completely defined by 
the transition from generation gio g + 1 : 

1. Weighted intermediate recombination' of the parental vec- 
tors into the recombinant vector 



f ... 

I WiX. 



(14) 



' Discrete recombination such as the cross-over commonly prac- 
ticed in genetic algorithms is not invariant with respect to linear trans- 
formations of the search space. 
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The weights wi,...,w^ are internal strategy parameters 
with canonical values 



Wi - ln(fi + 1) - ln(/), i - 1, . . . ,;U. 



(15) 



The initial recombinant vector (jr)*"' is expected to enable 
the resulting mutation operator to sample the relevant part 
of the object parameter space. 
2. Mutation of the recombinant vector to produce a new fam- 
ily of A > fi offsprings 



(16) 



where z^f^^\ . . . , Z^*^'* 6 are a family of (0, /)-normally 
distributed random vectors (i.e. the vector components are 
(0, l)-normally distributed), and D'-s\ B^) e R"''" diagonal- 
ize the covariance matrix 



(17) 



The elements of the diagonal matrix D*^^* are the square 
roots of the eigenvalues of C* "^*, while the column vectors of 
B'-^^ are the corresponding unit eigenvectors. Constrained 
optimization can be realized by resampling z'f^^\ . . . , Z*f 
until the constraints are satisfied. 
3. Selection of the fi best individuals among the offspring pop- 
ulation to become the next generation of parental vectors 



/ = 1, 



(18) 



Since the production of offsprings is independent of order 
the values of the objective function 



can be computed in parallel. 
4. Cumulation of the distribution evolution path 



(19) 



pT'^ = (1 - c,)pf^ + c„ Vcc(2-q)B(^>Z)(^><z)<^^+" (20) 
with 



Cw — 



(21) 



and 

iz)' 



,(S+1) ^ ^i=l ^'^i-^ 



(22) 



where Z;^^^' follows fromj;^^^ by Eq. M6\ . The distribution 
cumulation rate q e (0, 1] is an internal strategy parameter 
with canonical value 



Cc = 



n -1-4 



(23) 



If Cc = 1, no cumulation will occur. Initially, the distribu- 



tion evolution path is p^^^ - 0. 
5. Adaption of the covariance matrix 

C(«-l) = (l-Ceov)C(«^+Ccov(acovPr'^(/^^«"'')^ 

+ (1 - aeov)<Z)*^+'>) , (24) 



where 



WiB^s)B^sht,'^ (Bfe)i)fe)z*f;")' 



(25) 



The expressions Pc^^'Vpc^^'')^ and {ZY^^^^ are symmetric 
matrices of rank one and min(ju, n), respectively, and gener- 
alize the conceptual scheme illustrated in Fig.|3in case of 
a single parent ES. The parameter acov € [0, 1] combines 
the two adaption mechanisms whereas Ccov £ [0, 1) mod- 
erates the adaption rate. Both numbers are internal strategy 
parameters with canonical values 



and 



2ac, 



+ (1 - Qfcov) min i. 



2cl - 1 



(26) 



(27) 



(n+ V2)2 ■ ^' ^ \"(n + 2)2 + 4y 

If Ccov = 0, no adaption will occur. Initially, C**'^ = / or 
the square of any diagonal matrix properly scaling the op- 
timization problem. 

Cumulation of the step size evolution path 

pT'^ = (1 - c^)p^«^ + c„, ^JcA2-c^)B^'\z)^^'^'\ (28) 

where the step size cumulation rate Co- e (0, 1] is an internal 
strategy parameter with canonical value 



-h2 



n + cl + 3 



(29) 



If Co- - 1, no cumulation will occur. Initially, the step size 



evolution path is p^^* = 0. 
7. Adaption of the global mutation step size 

(c^\\pr''\\-E„] 



cr*^' exp 



Uq- l^fi 



(30) 



where the second fraction is the relative deviation of the 
length of pIj^^^ from its expected value in the case of no 
selection pressure, E„ — V2r(^)/r(|), and the damp- 
ing constant da- > Cq- is an internal strategy parameter with 
canonical value 



1 H- Co- -I- 2 max 



0, 



1 



n + 1 



1 



(31) 



The initial mutation step size cr^"^ is expected to enable the 
resulting mutation distribution to sample the relevant part 
of the object parameter space. 

The impact and canonical se tting of i nte rnal strategy pa- 
ramete rs are discussed in detail in lPaoer aI and lHansen & KernI 
ll2004 . 



2.4.2. Numerical implementation 

The numerical implementation of the CMA-ES algorithm is ba- 
sically straightforward.^ We coded different function templates 

- Sophisticated examples are given on the home page of Nikolaus 
Hansen at http : //www . bionik . tu-berlin . de 
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for both unconstrained and constrained optimization. The tem- 
plate instantiation requires a random number generator and an 
eigenvalue decomposition algorithm to be supplied as generic 
arguments. The numerical code is designed to perform in par- 
allel when running on shared memory mul tiprocessing archi- 
tectur es a nd has been applied in practice bv lOuast et alJ ll2Q02l 
l2004) and lReimers et al.1 ll2003h .^ 

Since an adequate random number generator producing 
high-dimensional equidistribution is absolutely essential to en- 
sure the ES practically performs as expected in theory, we ap- 
ply the Mersenne Twister algorithm furnishing equidistributed 
un iform deviates in up to 623 dimensions with period 2'^^"*' - 
1 dMatsumoto & Nishlmura.199 8). The uniform deviates are 
converted into normal deviates by means of the polar method. 

The Linear Algebra Package provides several algorithms 
suitable for diagonalizing the covariance matrix. For exam- 
ple, the routines decomposing tridiagonal matrices into rela- 
tively robust representations accurately complete the diagonal- 
ization of an n x n symmetric tridiagonal matrix in O(n^) rather 
than t he regular 0{n^) arithmetic operations (Dhillon & Parlett 
I2OO4I) . In particular, it is feasible to diagonalize the covariance 
matrix jus t every n /lQ generation to minimize the numerical 
overhead jPaper Ah . For, say, n > 10000 using completely 
correlated mutation distributions will be impracticable and an 
ES algorithm calculating just the tridiagonal covariance ma- 
trix elements or adapting just individual step sizes (B'-^^ - I, 
C'-^^ - £)*^'£)*^' diagonal) will be appropriate. 



3. Spectral decomposition 

In the case of pure line absorption, the observed spectral flux 
F{A) is modelled as the product of the continuum background 
and the instrumentally convoluted absorption term 



Defining 



(32) 



(33) 



and approximating the local continuum background by a linear 
combination of Legendre polynomials, Eq. (I32> transforms into 



FiA) = Y,'^kL,[m]AiA), 



(34) 



where denotes the Legendre polynomial of order k, and <p is 
a linear map onto the interval [-1,1]. The instrumental profile 
P(^) is modelled by a normalized Gaussian defined by the spec- 
tral resolution of the instrument. The instrumental convolution 
can be calculated piecewise analytically while approximating 
the absorption term with a polyline or a cubic spline. Without 
significant loss of accuracy, the integration can be restricted to 
the interval |^| < 26, where 6 is the full width at half maximum 
of the instrumental profile. 



^ The source code is publicly available on the home page of RQ at 
http : / /www . hs . uni -hamburg . de 



Presuming pure Doppler broadening, the optical depth t is 
modelled by a superposition of Gaussian functions. If A/, fi, Zi, 
bi, Ni, and A,. - {1 + Zi)Ai denote, respectively, the rest wave- 
length, the oscillator strength, the cosmological redshift, the 
line broadening velocity, the column density, and the observed 
wavelength corresponding to line /, then 



T(/l) = 2g,(/l) 



with 



exp 



c A- A,, 
'bi A,, 



(35) 



(36) 



If natural broadening is important, the Gaussian functions will 
have to be replaced with Voigt functions. The latter can ef- 
ficiently be calc ulated by using pseudo- Voigt approximations 
llTdaetalJ200(ll). 

Taking the proper identification of lines for granted, the 
decomposition of an absorption line spectrum into individual 
profiles is a parametric inverse problem involving a tuple of 
three model parameters per line: position, broadening velocity, 
and column density. Given an observed set of spectral fluxes 
Fi,F2, ■ ■ ■ and normally distributed errors a-\ , 1x2, . . . sampled 
at wavelengths Ai,A2,--- the optimal parametric decomposition 
F(A) is calculated by minimizing the normalized residual sum 
of squares 



(37) 



For any A(A), the minimization with respect to the coefficients 
Q presents a linear optimization problem 



RSS 



Fj-Y.kCkU[cl>(Aj)]A(Aj) 



(38) 



which is solved directly by means of Cholesky decomposition. 
In the case of normally distributed errors the minimum RSS 
maximizes the likelihood. 

Note that since the normalized RSS is invariant with respect 
to permutations of line parameter 3-tuples, any self-adaptive 
ES initially will require several generations to adapt to the sym- 
metry. Any further global degeneracies of the search space will 
lengthen the initial adaption phase. 

4. Performance tests 

4.1. Test cases 

We have synthesized four exemplary test cases to investigate 
the global optimization performance of the CMA-ES when 
applied to the problem of spectral decomposition. The test 
cases are based on the characteristics of real QSO spectra and 
present a series of metal line ensembles increasing in complex- 
ity. Test cases A, B, C, and D render superpositions of six, 
eight, ten, and twelve components, respectively (Fig. |3}- All 
test cases are synthesized simulating an instrumental resolu- 
tion of ^ = 60 000 and a curved background continuum. Both 
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Fig. 3. Aitificial test cases based on an ensemble of intergalactic Ca ii /13935 lines toward the QSO HE 05 15-4414. The simulated 
instrumental resolution is 7? = 60 000 while the simulated signal-to-noise ratio is about 50. Individual components are marked by 
vertical hnes. The positional search space is indicated by solid dots 



Poissonian and Gaussian white noise are added producing a 
random continuum noise level of about two percent. Since ES 
are not destabilized by small scale oscillations of the objective 
function the complexity of the decomposition problem does not 
depend on the noise level of the data but is solely determined 
by the number and separation of spectral lines. The parameter- 
ization of the artificial lines used to synthesize the test cases is 
listed in Table ^ Several lines are barely separated or occcur 
close to the limit of the simulated spectral resolution. 

The characteristics of the test cases are motivated by the 
analysis of QSO spectra recorded with the UV- Visual Echelle 
Spectropgraph installed at the Very Large Telescope. Further 
test cases exhibiting different characteristics will emerge in the 
future in course of the analysis of QSO spectra recorded with 
the Coude Echelle Spectrograph operated at the ESO 3.6 m 
telescope. We do not consider line ensembles consisting of less 
than six components since these cases normally pose no chal- 
lenge to the CMA-ES algorithm. Primitive cases exhibiting just 
a single component are completed in less than 50 generations 
using the canonical (1, 10)-CMA-ES. 



4.2. Algorithm application 

We compare the global performance of the CMA-ES to that 
of several classical optimization algorithms provided by the 
^Nu merical RecipesI collection: Fletcher-Reeves-Polak-Ribiere 
(FRPR, a conjugate gradient variant), Broyden-Fletcher-Gold- 
fab-Shanno (BEGS, a virtual metric variant), Levenberg-Mar- 
quardt, Powell (a direction set variant), and the downhill sim- 
plex. The two latter algorithms do not require the calculation of 
partial derivatives and serve as direct standards of comparison. 

The positional search space is restricted to an interval be- 
ing 40 percent wider than the separation of the outer com- 
ponents (Fig. O, while the broadening velocities and col- 
umn densities are confined to 1.0 < b (kms ') < 10.0 and 
10.0 < logNicmr^) < 14.0. Since the classical algorithms are 
not prepared to han dle simple parametric bounds per se, the 
'Numerical Recioes' routines are modified in such a way that 
any step attempting to escape is suppressed. The partial deriva- 
tives required by the FRPR, BEGS, and Levenberg-Marquardt 
algorithms are calculated numerically using the symmetrized 
difference quotient approximation. 
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Table 1. Parameterization of the artificial lines used to syn- 
thesize the test cases rendered in Fig. |3] Rest wavelengths 
and oscillator strengths comply with the Ca ii /13935 transition. 
All CMA-ES runs converging at the global minimum result in 
the same parameterization indicated in parantheses, along with 
standard deviations provided by the diagonal elements of the 
scaled covariance matrix 



Line 



A-1 
A-2 
A-3 
A-4 
A-5 
A-6 



B-1 
B-2 
B-3 
B-4 
B-5 
B-6 
B-7 
B-8 



C-1 
C-2 
C-3 
C-4 
C-5 
C-6 
C-7 
C-8 
C-9 
C-10 



D-1 

D-2 

D-3 

D-4 

D-5 

D-6 

D-7 

D-8 

D-9 

D-10 

D-11 

D-12 



b (km s ' ) 



log W (cm -) 



150800 (800) 
150950 (949) 
151120 (119) 
151290 (290) 
151490 (488) 
151570 (569) 



3.0 (2.9 ±0.1) 
6.0 (6.0 ±0.2) 
2.5 (1.9 ±0.2) 
5.0 (4.8 ± 0.3) 
3.0 (2.8 ±0.3) 
4.5 (4.6 ± 0.2) 



12.30 
11.80 
11.60 
11.50 
11.60 
12.00 



150700 
150850 
151100 
151190 
151280 
151370 
151490 
151580 



(700) 
(850) 
(101) 
(189) 
(279) 
(370) 
(490) 
(580) 



3.0 (2.9: 

6.0 (6.0: 

2.5 (2.5 : 

4.0 (4.0: 

5.5 (5.6: 
3.0 (3.2: 
4.5 (4.4 : 
3.5 (3.5: 



0.1) 
0.2) 
0.2) 
0.5) 
0.7) 
0.3) 
0.2) 
0.1) 



12.30 
11.80 
11.70 
11.50 
11.60 
11.60 
11.80 
12.20 



150610 
150700 
150820 
151000 
151170 
151260 
151360 
151420 
151500 
151580 



(610) 
(701) 
(821) 
(000) 
(170) 
(261) 
(358) 
(421) 
(500) 
(579) 



3.0 (2.9: 
5.5 (5.4: 
3.0 (2.8: 
2.5 (2.5 : 
4.0 (3.9: 
5.5 (5.9: 
3.5 (3.2: 
4.0 (3.9: 
3.5 (3.3: 
6.0 (6.1 : 



0.1) 
0.3) 
0.3) 
0.4) 
0.3) 
0.8) 
0.8) 
0.8) 
0.4) 
0.2) 



12.30 
11.80 
11.60 
11.40 
11.80 
11.60 
11.50 
11.70 
11.90 
12.20 



150600 
150700 
150810 
150970 
151100 
151200 
151270 
151350 
151420 
151500 
151560 
151650 



(600) 
(700) 
(810) 
(968) 
(098) 
(203) 
(270) 
(347) 
(418) 
(500) 
(562) 
(651) 



3.0 (3, 
5.5 (5, 
3.0 (3, 
2.5 (2, 
4.0 (4, 
5.5 (5, 
3.0 (2, 
4.5 (4, 
3.5 (4, 
2.5 (2, 
5.0 (5, 
4.0 (3, 



0.1) 
0.2) 
0.2) 
0.2) 
0.2) 
1.0) 
0.8) 
0.6) 
0.8) 
0.3) 
0.7) 
0.1) 



12.30 
11.80 
11.70 
11.60 
11.70 
11.60 
11.50 
11.80 
11.60 
12.00 
11.90 
12.20 



12.32: 
11.80: 
11.60: 
11.51 : 
11.58: 
12.01 : 



0.01) 

0.01) 
0.02) 
0.02) 
0.02) 
0.01) 



12.32: 
11.80: 
11.69: 
11.50: 
11.61 : 
11.60: 
11.80: 
12.20: 



0.01) 
0.01) 
0.01) 
0.03) 
0.03) 
0.02) 
0.01) 
0.01) 



12.32: 
11.79: 
11.59: 
11.38: 
11.79: 
11.61 : 
11.51 : 
11.71 : 
11.90: 
12.21 : 



0.02) 
0.02) 
0.02) 
0.02) 
0.02) 
0.04) 
0.07) 
0.05) 
0.03) 
0.01) 



12.30: 

11.81 : 

11.70: 
11.59: 
11.70: 
11.60: 
11.47: 
11.77: 
11.69: 
12.00: 
11.93: 
12.19: 



0.01) 
0.01) 
0.01) 
0.01) 
0.02) 
0.06) 
0.08) 
0.05) 
0.06) 
0.02) 
0.04) 
0.01) 



Regarding the CMA-ES, we apply an (100, 200) algorithm 
with internal strategy parameters preset to the canonical val- 
ues except for a^ov = and an increased covariance matrix 
adaption rate Ccov The diagonal elements of C^"^ are initialized 
to the squared half widths of the parameter intervals while the 
mutation step size is initialized to cr*"' = 0.5. 

For each algorithm and test case we monitor the history 
of 100 randomly initialized optimization runs. Providing the 
true number of absorption components as prior information, the 
initial object parameters are randomly drawn from a uniform 
distribution. The random noise is generated only once for each 



test case and is the same for all runs. All optimization runs are 
stopped after 100000 evaluations of the normalized RSS. 

4.3. Test results 

The optimization runs are evaluated in terms of the ratio of the 
final normalized RSS to the normalized RSS corresponding to 
the true parameterization. For all test cases the RSS ratio is 
marginally less than unity at the global minimum. In general, 
an RSS ratio of unity just indicates the statistical consistency 
of the optimized and the true absorption profiles, but does not 
indicate the consistency of the optimized and the true param- 
eterization. In particular in the case of QSO spectra exhibit- 
ing lower signal-to-noise ratios inconsistent optimized param- 
eters will occur regularly. But however, since in our test cases 
the true parameterization is recovered by all runs converging 
at the global minimum (Table the RSS ratio is appropriate 
to assess the global optimization performance. The outcome of 
optimization runs is summarized in Fig. |3 and Table |2l while 
Fig- m illustrates the performance of the CMA-ES during dif- 
ferent stages of the optimization. In the following paragraphs 
the test results are described in detail. 



4.3.1. Case A 

Case A consists of four isolated and two blended components. 
The lowest local minimum occurs when the weaker line of the 
blend is missed, but an isolated line is hit twice instead. Higher 
local minima occur whenever the blend is modelled correctly 
but any isolated component is missed. The CMA-ES hits the 
global minimum in 56 and the lowest local minimum in 38 
runs. Another six runs converge at higher local minima while 
missing an isolated component. The most stable classical tech- 
nique is the Powell algorithm, aiTiving at the global minimum 
in six, at the lowest local minimum in nine, and between these 
both in twelve runs. The most successful gradient technique is 
the Levenberg-Marquardt algorithm, reaching the global mini- 
mum in three runs. The latter algorithm also is the fastest, need- 
ing about 1000 evaluations of the normalized RSS to locate the 
global minimum, whereas the Powell and CMA-ES algorithms 
require about 14000 and 18 000 evaluations, respectively. The 
FRPR and simplex algorithms hit the global minimum in two 
runs each, needing about 36000 and 68 000 RSS evaluations, 
respectively. Except for the Powell algorithm the majority of 
deterministic runs misses two or more components. The BEGS 
algoiithm never hits any component. 

4.3.2. Case B 

Case B is similar to case A but exhibits two additional compo- 
nents and a less naiTow blend. The CMA-ES hits the global 
minimum in 77 and the lowest local minimum in 15 runs. 
Another eight runs converge at higher local minima. No de- 
terministic algorithm locates the global minimum in more than 
three runs. The Powell algorithm is still relatively stable, miss- 
ing one or two components in the majority of runs. 
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Fig. 4. Outcome of 1 00 randomly initialized optimization runs using the CMA-ES (first row of histograms), Levenberg-Marquardt 
(second row), and Powell algorithms. The number of runs converging at the global minimum is indicated by the leftmost bar. The 
FRPR, BFGS, and simplex algorithms perform worse and are omitted for convenience 

Table 2. Global optimization performance summarized in terms of the median of the final normalized RSS ratio, the number of 
runs converging at the global minimum, and the median number of RSS evaluations needed for that 



Algorithm 


Case A 




Case B 




CaseC 




CaseD 


CMA-ES 


0.97 


56 


18 000 


0.99 


77 


40000 


1.04 


32 


62000 


1.09 27 82 000 


FRPR 


4.86 


2 


36000 


9.45 


3 


70000 


6.65 






11.7 - - 


BFGS 


26.8 






32.9 






34.7 






37.5 - - 


Levenberg-Marquardt 


3.97 


3 


1000 


6.11 






4.24 






5.04 - - 


Powell 


2.12 


6 


14000 


2.67 


2 


20000 


2.05 


2 


26000 


2.69 - - 


Simplex 


12.2 


2 


68 000 


25.6 






6.96 






35.8 - - 



4.3.3. Case C 



4.3.4. Case D 



Case C again exhibits two additional components resulting in 
an ensemble of largely blended lines. The lowest local mini- 
mum occurs when the seventh component is missed, but an- 
other component is hit twice instead. The global minimum of 
the objective function almost has degenerated, being different 
from the lowest local minimum merely by seven percent. The 
CMA-ES hits the global minimum in 32 and the lowest local 
minimum in 22 runs. The majority of the remaining runs con- 
verges at higher local minima while missing a different one 
than the seventh component. The Powell algorithm hits the 
global as well as the lowest local minimum in 2 runs each and 
misses one or two components in the majority of runs. No fur- 
ther classical algorithm hits the global minimum. 



Case D is very similar to case C but exhibits two additional 
components. The CMA-ES hits the global minimum in 27 runs. 
The majority of runs aiTives at lower local minima without 
becoming stationary (Fig. |5}. No classical algorithm hits the 
global minimum. 



4.3.5. Efficiency 

The number of RSS evaluations required by the CMA-ES to 
converge at the global minimum increases linearly with the 
number of lines superimposed in the test case (Table |2j- The 
linear congelation coefficient is about a - 10000 RSS evalua- 
tions per line. In case of the Powell algorithm the linear corre- 
lation coefficient is about a = 3000. 
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Fig. 5. History of 100 randomly initialized optimization runs using the CMA-ES. The number of runs converging at the global 
minimum is indicated by the leftmost bar. The number of RSS evaluations is calculated by multiplying the generation number 
with 200 



5. Summary and conclusions 

All tested classical algorithms require an adequate initial guess 
to locate the global optimum of the objective function when 
applied to the problem of spectral decomposition. In contrast, 
the CMA-ES is demonstrably capable to calculate the optimal 
decomposition without demanding any particular initialization, 
and therefore is just asking to be used in automatized spectro- 
scopic analysis software. 

The CMA-ES does not guarantee to find the optimal so- 
lution: characteristic spectral features are modelled correctly, 
but features exhibiting just a small attractor volume such as 
narrow components or tight blends are frequently not distin- 
guished fittingly. Larger populations are expected to improve 
the chance to hit the global optimum but require a greater num- 
ber of objective function evaluations. Peak finding algorithms 
could detect both the proper number (a problem that we have 
completely left aside) and position of spectral components and 
provide an improved initialization. But irrespective of whether 
an automatized analysis software will be realized, the CMA-ES 



is an elegant and highly competitive general purpose algorithm 
which is easy to implement. In our opinion, its integration into 
the standard astronomical data analysis packages will be thor- 
oughly worthwhile and its widespread use will contribute to 
the further comprehension and improvement of suchlike algo- 
rithms. 
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